Introduction

The goal of this project is to do exploratory data analysis on the Netflix dataset. Given the wide scope of possibilities in this dataset, it might be interesting to have some business questions to explore and answer, such as:

Loading Data

Data Inspection & Preprocessing

Before proceeding with the Exploratory Data Analysis (EDA), we have some findings:

Important Note: The dataset ends in September 2021, so 2021 is only partially covered and its values may be lower than the true full-year figures.

Exploratory Data Analysis

What type of production is more prevalent on Netflix (TV Show vs Movie)?

How are productions distributed across the world given their country of origin?

In the plot above, a few countries account for most of the total productions, while many countries have only a few. The US has the most productions, followed by India and the UK. To get better insight into how productions are distributed geographically, we will plot the total productions per nation on the world map. Note that while we only plotted the 50 countries with the most productions above, the dataset contains 86 countries in total.
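The per-country totals discussed above could be computed along the following lines. This is only a sketch: the column name `country`, the comma-separated format for co-productions, and the sample rows are assumptions made for illustration, not details confirmed by this notebook.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset is loaded in the "Loading Data" step.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "country": ["United States, India", "United States", None, "United Kingdom"],
})

def count_by_country(frame: pd.DataFrame, column: str = "country") -> pd.Series:
    """Count productions per country, crediting a co-production to every country listed."""
    countries = (
        frame[column]
        .dropna()            # productions with no country are ignored
        .str.split(",")      # assumed separator for co-productions
        .explode()
        .str.strip()
    )
    countries = countries[countries != ""]  # guard against stray trailing commas
    return countries.value_counts()

country_counts = count_by_country(df)
top_50 = country_counts.head(50)  # the subset shown in a top-50 bar plot
```

Crediting each listed country is one design choice among several. Counting only the first listed country would give smaller totals for frequent co-producers.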

In terms of regions, North America has by far the most productions. While most continents seem to have productions in the majority of their countries, Africa has only a few countries with productions (with Egypt, Nigeria and South Africa accounting for the majority). Since some continents have many small countries (as is the case for Europe and, to some extent, Asia), it might be interesting to group countries by continent to get a better understanding of how production numbers are distributed across the globe.

In total productions, North America leads Asia by a margin of 60.2% and Europe by 148.73%.
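One plausible reading of these margins is the relative difference between the leader's total and the other continent's total. The sketch below illustrates that formula. The helper name `lead_margin_pct` and the totals are hypothetical and are not the values behind the figures quoted above.

```python
def lead_margin_pct(leader_total: float, other_total: float) -> float:
    """Percentage by which leader_total exceeds other_total."""
    if other_total <= 0:
        raise ValueError("other_total must be positive")
    return (leader_total - other_total) / other_total * 100

# Hypothetical continent totals, chosen only to keep the arithmetic easy to follow.
totals = {"North America": 300, "Asia": 200, "Europe": 120}

margin_vs_asia = lead_margin_pct(totals["North America"], totals["Asia"])
margin_vs_europe = lead_margin_pct(totals["North America"], totals["Europe"])
```

Under this definition, a margin above 100% means the leader has more than twice as many productions as the continent it is compared with.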

An interesting finding regarding the distribution of productions is that North America's total comes mostly from the US, whereas Asia and Europe have productions from many different countries, some of which are fairly small but still contribute substantially to the total number of productions.

Despite the low density of productions in European and Asian countries (with the exception of the UK and India), these two continents still have a considerable total number of productions. Continents in the Southern Hemisphere have low total productions, with a handful of nations responsible for most of them.

How are productions distributed in terms of release year?

We observe that most productions were released in the late 2010s, close to the final year of this dataset (2021). The number of productions released drops drastically before 2010. To further corroborate this, we observed during the "Data Inspection" phase that at least 50% of the productions were released between 2017 and 2021. Another interesting finding is that TV Shows appear to have gained ground relative to Movies: Movies historically made up the majority of releases, but as we approach 2021, TV Shows account for nearly half of the total releases.
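The median release year and the yearly share of TV Shows could be checked with something like the sketch below. The column names `type` and `release_year`, the labels "Movie" and "TV Show", and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for the full dataset.
df = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "release_year": [2010, 2010, 2010, 2021, 2021, 2021],
})

# Half of the productions were released in or after the median year.
median_year = df["release_year"].median()

# Rows: release year, columns: production type, values: number of productions.
counts = pd.crosstab(df["release_year"], df["type"])

# Fraction of each year's releases that are TV Shows.
tv_share = counts["TV Show"] / counts.sum(axis=1)
```

A `tv_share` value close to 0.5 in the most recent years would match the observation that TV Shows are nearly half of the total releases.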

When did Netflix add the most productions?

What is the most common duration for Movies and for TV Shows?

For Movies

We observe that the majority of Movies are at least 1 hour (60 minutes) long, with durations peaking at around 100 minutes, and very few Movies make it past the 2.5-hour (150-minute) mark.

For TV Shows

Most TV Shows have only 1 season. The number of shows with 2 or more seasons decreases roughly exponentially (which is why we use a logarithmic scale in the graph above). Nevertheless, there are some outliers, such as two TV Shows with 15 seasons and one with 17 seasons.
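Both duration views rely on turning a text duration into a number. The sketch below assumes a single `duration` column holding strings such as "90 min" for Movies and "2 Seasons" for TV Shows, plus a `type` column. These names and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows mixing both production types.
df = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "TV Show", "TV Show"],
    "duration": ["90 min", "104 min", "1 Season", "2 Seasons", "1 Season"],
})

# Pull the leading integer out of the text: minutes for Movies, seasons for TV Shows.
df["duration_value"] = pd.to_numeric(
    df["duration"].str.extract(r"(\d+)", expand=False),
    errors="coerce",
)

# Movie lengths in minutes, ready for a histogram.
movie_minutes = df.loc[df["type"] == "Movie", "duration_value"]

# Number of TV Shows per season count, ordered by number of seasons.
season_counts = (
    df.loc[df["type"] == "TV Show", "duration_value"]
    .value_counts()
    .sort_index()
)
```

Because the two units differ, the numeric column is only meaningful when filtered by production type, as done for `movie_minutes` and `season_counts`.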

What are the most common production genres?

Interestingly, International Movies and International TV Shows are among the most popular genres. These labels refer to productions made outside the US whose predominant language is not English. Aside from these 2 genres, Dramas, Comedies, Documentaries and Action & Adventure are among the most popular.

Since the US is the biggest producer of films, it might be an interesting exercise to see how the most popular genres differ for US productions (assuming the "International" genres should not exist for American films).

When isolating the United States, Dramas, Comedies and Documentaries are still at the top, but surprisingly Children & Family Movies are more common than Action & Adventure. TV Dramas seem to be noticeably less popular than in global productions.
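A comparison between global and US genre counts could look like the sketch below. It assumes genres are stored as comma-separated text in a column named `listed_in` and countries in a column named `country`. Both names, the "United States" label, and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "country": ["United States", "India", "United States, Canada", None],
    "listed_in": [
        "Dramas, Comedies",
        "International Movies, Dramas",
        "Documentaries",
        "Comedies",
    ],
})

def genre_counts(frame: pd.DataFrame) -> pd.Series:
    """Count how many productions carry each genre label."""
    genres = frame["listed_in"].dropna().str.split(",").explode().str.strip()
    return genres.value_counts()

# Treat a production as a US production if the US appears anywhere in its country list.
is_us = df["country"].fillna("").str.contains("United States", regex=False)

global_genres = genre_counts(df)
us_genres = genre_counts(df[is_us])
```

Matching on "appears anywhere" includes co-productions. Restricting to productions whose only country is the US would be a stricter alternative.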

How are productions distributed across different ratings?

Among the unique rating values, we found 3 instances of movies whose "duration" is in the rating column: '74 min', '84 min', '66 min'. All 3 instances belong to the director Louis C.K., and since this appears to be an error, we will drop them for our rating analysis.
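Rows like these can be excluded with a pattern match instead of hard-coding the three values. The sketch below assumes a column named `rating`. The sample rows are hypothetical and only mimic the shape of the problem.

```python
import pandas as pd

# Hypothetical sample rows: the second one has a duration in the rating column.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "rating": ["TV-MA", "74 min", "PG-13", None],
})

# True where the rating is just a number of minutes, e.g. "74 min".
looks_like_duration = df["rating"].str.fullmatch(r"\d+\s*min", na=False)

# Keep everything else for the rating analysis; missing ratings are left untouched here.
ratings_df = df.loc[~looks_like_duration].copy()
```

Working on a copy keeps the original frame intact, so the affected rows remain available for analyses that do not depend on the rating.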

Most productions are rated either TV-MA or TV-14, which implies most productions are not aimed at children.

It might be interesting to further investigate the breakdown of ratings, including which segment of the population these productions cater to most (e.g. adults, teens or children).
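One way to group ratings into audience segments is a lookup table. The mapping below is an assumption made for illustration, not necessarily the grouping used in this analysis, and the borderline ratings (for example PG or TV-PG) could reasonably be assigned differently.

```python
import pandas as pd

# Hypothetical rating-to-audience mapping; adjust to the grouping you want to study.
AUDIENCE_BY_RATING = {
    "TV-MA": "Adults", "R": "Adults", "NC-17": "Adults",
    "TV-14": "Teens", "PG-13": "Teens",
    "TV-PG": "Children", "PG": "Children", "TV-G": "Children", "G": "Children",
    "TV-Y": "Children", "TV-Y7": "Children", "TV-Y7-FV": "Children",
}

# Hypothetical sample of rating values.
ratings = pd.Series(["TV-MA", "TV-14", "PG", "NR", "R"], name="rating")

# Ratings missing from the mapping (e.g. "NR") fall into a catch-all bucket.
audience = ratings.map(AUDIENCE_BY_RATING).fillna("Unrated/Other")

# Share of productions per audience segment.
audience_share = audience.value_counts(normalize=True)
```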

Although adult restricted ratings are the most common, Netflix still has a considerable number of productions which can be viewed by the younger population segments:

Have new productions changed the rating distribution over time?

The ratings of productions added vary greatly in the first years until 2014, which, as we have seen, had considerably fewer productions added than the following years. From 2015 onward we see some repeating patterns in how the proportions of ratings behave. Some changes are still observable, which might reflect how Netflix is adapting its released productions to different segments of its users. It might be interesting to plot the evolution of age groups to test this theory.

As the number of released productions rose from 2015 onwards, Netflix seemed to maintain a fairly stable proportion of ratings across the different segments of the population (adults, teens and children), yet there seems to be a slight rise in productions fit for teenagers at the expense of productions catered to children. In summary, in the past few years, adult-restricted productions make up slightly less than half of added productions, followed by productions suitable for teenagers and then productions suitable for children.
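The yearly proportions per audience segment could be computed with a row-normalized cross-tabulation, as sketched below. The `date_added` column, its "Month day, year" text format, the derived `audience` column, and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows; "audience" is assumed to be derived from the rating beforehand.
df = pd.DataFrame({
    "date_added": ["September 25, 2021", "January 15, 2020", "March 13, 2020", "July 17, 2021"],
    "audience": ["Adults", "Teens", "Adults", "Adults"],
})

# Parse the text date and keep only the year in which the production was added.
df["year_added"] = pd.to_datetime(
    df["date_added"].str.strip(),
    format="%B %d, %Y",
    errors="coerce",
).dt.year

# Each row sums to 1: the share of that year's additions per audience segment.
share_by_year = pd.crosstab(df["year_added"], df["audience"], normalize="index")
```

Normalizing by row removes the effect of the growing number of additions per year, so that only changes in the mix of audiences remain visible.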

Conclusion

In conclusion, we found some key insights in the Netflix dataset: most Netflix productions are Movies and have been produced in the Northern Hemisphere (with the US being a major player); most productions have been both released and added in recent years (up until September 2021); most Movies are around 100 minutes long and most TV Shows don't make it past a couple of seasons; the most popular genres aside from International Movies and International TV Shows are Dramas, Comedies and Documentaries; and productions rated TV-MA and TV-14 are the most common, with Netflix remaining fairly stable in the age groups that the ratings of added productions cater to.

The dataset in question is fairly limited: although it holds information about the nature of Netflix's productions (namely TV Shows and Movies), it lacks any user information, such as how users rate each production or how many times productions are viewed, as well as other potential metrics and preferences. Nevertheless, one could argue that after this EDA it would be easier to find potential users who may show interest in productions based on preferences of genre, country of production, rating, duration or any of the other insights we found.